Design a data structure that will be initialized with a string array, and then it should answer queries of the shortest distance between two different strings from the array.
Implement the WordDistance class:
WordDistance(String[] wordsDict)initializes the object with the strings arraywordsDict.int shortest(String word1, String word2)returns the shortest distance betweenword1andword2in the arraywordsDict.
Example 1:
Input
["WordDistance", "shortest", "shortest"]
[[["practice", "makes", "perfect", "coding", "makes"]], ["coding", "practice"], ["makes", "coding"]]
Output
[null, 3, 1]
Explanation
WordDistance wordDistance = new WordDistance(["practice", "makes", "perfect", "coding", "makes"]);
wordDistance.shortest("coding", "practice"); // return 3
wordDistance.shortest("makes", "coding"); // return 1
Constraints:
1 <= wordsDict.length <= 3 * 1041 <= wordsDict[i].length <= 10wordsDict[i]consists of lowercase English letters.word1andword2are inwordsDict.word1 != word2- At most
5000calls will be made toshortest.
Average Rating: 4.36 (39 votes)
Solution
Before looking at the solution for this problem, let's look at what the problem asks us to do in simpler terms. We have to design a class which receives a list of words as input in the constructor. The class has a function which we need to implement and that function is shortest which takes two words as input and returns the minimum distance between the two as the output.
When the problem talks about the distance between two words, it essentially means the absolute gap between the indices of the two words in the list. For e.g. if the first word occurs at a location i and the second word occurs at the location j, then the distance between the two would be abs(i - j).
The question asks us to find the minimum such different between words which clearly indicates that the words can occur at multiple locations. If we have K occurrences for the word1 and L occurrences for the word2, then iteratively checking every pair of indices will give us a O(N2) algorithm which won't be optimal at all. We won't discuss that algorithm here since it is very straightforward.
The brute-force algorithm would simple consider all possible pairs of indices for (word1_location, word2_location) and see which one produces the minimum distance. Let's try and build on this idea and see if some pre-processing can help us out reduce the complexity of the brute-force algorithm.
Approach 1: Using Preprocessed Sorted Indices
Intuition
A given word can occur multiple times in the original word list. Let's suppose the first word, word1 in the input to the function shortest occurs at the indices [i1, i2, i3, i4] in the original list. Similarly, let's assume that the second word, word2, appears at the following locations inside the word list [j1, j2, j3].
Now, given these list of indices, we are to simply find the pair of indices (i, j) such that their absolute difference is minimum.
The main idea for this approach is that if the list of these indices is in sorted order, we can find such a pair in linear time.
The idea is to use a two pointer approach. Let's say we have a pointer i for the sorted list of indices of word1 and j for the sorted list of indices of word2. At every iteration, we record the difference of indices i.e. abs(word1[i] - word2[j]). Once we've done that, we have two possible choices for progressing the two pointers.
word1[i] < word2[j]
If this is the case, that means there is no point in moving the j pointer forward. The location indices for the words are in a sorted order. We know that word2[j + 1] > word2[j] because these indices are sorted. So, if we move j forward, then the difference abs(word1[i] - word2[j + 1]) would be even greater than abs(word1[i] - word2[j]). That doesn't help us since we want to find the minimum possible distance (difference) overall.
So, if we have (word1[i] < word2[j]), we move the pointer 'i' one step forward i.e. (i + 1) in the hopes that abs(word1[i + 1] - word2[j]) would give us a lower distance than abs(word1[i] - word2[j]). We say "hopes" because it is not certain this improvement would happen.
Let's look at two different examples. In the first example we will see that moving i forward gave us the best difference overall (0). In the second example we see that moving i forward leads us to our second case (yet to discuss) but doesn't lead to any improvement in the difference.
Example-1
word1_locations = [2,4,5,9]
word2_locations = [4,10,11]
i, j = 0, 0
min_diff = 2 (abs(2 - 4))
word1[i] < word2[j] i.e. 2 < 4
move i one step forward
i, j = 1, 0 (abs(4 - 4))
min_diff = 0 (We hit the jackpot!)
Example-2
word1_locations = [2,7,15,16]
word2_locations = [4,10,11]
i, j = 0, 0
min_diff = 2 (abs(2 - 4))
word1[i] < word2[j] i.e. 2 < 4
move i one step forward
i, j = 1, 0
min_diff = 2 (2 < abs(7 - 4))
Here, we did not update out global minimum difference.
That is why we said earlier, moving 'i' forward may or
may not give a lower difference. But moving 'j' forward in
our case would definitely worsen the difference (or keep it same!).
Let's move onto our second scenario.
word1[i] > word2[j]
If this is the case, that means there is no point in moving the i pointer forward. We know that word1[i + 1] > word2[j] because these indices are sorted. So, if we move i forward, then the difference abs(word1[i + 1] - word2[j]) would be even greater than abs(word1[i] - word2[j]). That doesn't help us since we want to find the minimum possible distance (difference) overall.
So, along the similar lines of thought as the previous case, if we have (word1[i] > word2[j]), we move the pointer 'j' one step forward i.e. (j + 1) in the hopes that abs(word1[i] - word2[j + 1]) would give us a lower distance than abs(word1[i] - word2[j]). We say "hopes" because as showcased in the previous scenario, it is not certain this improvement would happen.
Now let's formally look at the algorithm for solving this problem.
Algorithm
- In the
constructorof the class, we simply iterate over the given list of words and prepare a dictionary, mapping a word to all it's locations in the array. - Since we process all the words from left to right, we will get all the indices in a sorted order by default for all the words. So, we don't have to sort the indices ourselves.
- Let's call the dictionary that we build,
locations. - For a given pair of words, obtain the list of indices (appearances inside the original list/array of words). Let's call the two arrays
loc1andloc2. - Initialize two pointer variables
l1 = 0andl2 = 0. - For a given
l1andl2, we first update (if possible) the minimum difference (distance) till now i.e.dist = min(dist, abs(loc1[l1] - loc2[l2])). Then, we check ifloc1[l1] < loc2[l2]and if this is the case, we movel1one step forward i.e.l1 = l1 + 1. Otherwise, we movel2one step forward i.e.l2 = l2 + 1. - We keep doing this until all the elements in the smaller of the two location arrays are processed.
- Return the global minimum distance between the words.
This represents the locations dictionary that we should build given the original words list in the constructor. The key represents the word and the value is a list containing indices in ascending order of occurrences throughout the array. Let's look at the minimum distance between the words apple and football in the array. So, we will be considering the two sorted lists of indices: [3, 6, 8, 12] and [2, 7, 9].
Complexity analysis
- Time complexity : The time complexity of the constructor of our class is O(N) considering there were N words in the original list. We iterate over them and prepare a mapping from key to list of indices as described before. Then, for the function that finds the minimum distance between the two words, the complexity would be O(max(K,L)) where K and L represent the number of occurrences of the two words. However, K=O(N) and also L=O(N). Therefore, the overall time complexity would also be O(N). The reason the complexity is O(max(K,L)) and not O(min(K,L)) is because of the scenario where the minimum element of the smaller list is larger than
allthe elements of the larger list. In that scenario, the pointer for the smaller list will not progress at all and the one for the longer list will reach to the very end. - Space complexity: O(N) for the dictionary that we prepare in the constructor. The keys represent all the unique words in the input and the values represent all of the indices from 0...N.
thanks for your article.
i believe the function that finds the minimum distance between the 2 words should be O(k + l).
consider the following case, K = L:
word1 = [1 3 5 7 9]
word2 = [2 4 6 8 10]
we will iterate all word1 and word2 based on ur logic. thus O(k + l)
April 30, 2019 9:07 PM
just a slight mistake in the article, in the first example, word1 and word 2 seem to happen at the same index 4 which can never happen according to the problem statement
July 29, 2019 9:56 AM
This method makes sense, but why can't we just repeat the same one in Shortest Word Distance I. We just store the whole words list in the constructor, and do the linear time method to get the min distance in 'shortest' function. The overall space complexity and time complexity are both O(N), which is same as the answer above.
Can anyone explain it to me? I really want to know.
poorly stated problem. The problem definition is highly incomplete.
The description "Your method will be called repeatedly many times with different parameters." gives me an impression that it's looking for a time complexity of O(1) calling shortest() lol.
I think having a memo map to record calculated results could be practically temporally more efficient though this would lead to O(N2) spacial complexity.
March 19, 2020 12:31 PM
I used binary search with iteration for querying minimal distance, whose time complexities is
O(min(K,L)logmax(K,L)). I think it will be better in cases when L >> K.
To find the leftmost occurrence of the other word to the right of the current position of the current word we can use a binary search (upper_bound) instead of a linear search. Here is a C++ implementation that runs in 36ms (faster than 100% of other C++ submissions):
class WordDistance final {
unordered_map<string, vector<int>> positions;
public:
WordDistance(const vector<string>& words) noexcept {
for (int i = 0; i < words.size(); ++i) positions[words[i]].push_back(i);
}
int shortest(const string& word1, const string& word2) const noexcept {
auto& vector_1 = positions.find(word1)->second;
auto& vector_2 = positions.find(word2)->second;
auto current_1 = vector_1.cbegin();
auto current_2 = vector_2.cbegin();
auto end_1 = vector_1.cend();
auto end_2 = vector_2.cend();
if (*current_1 > *current_2) swap(current_1, current_2), swap(end_1, end_2);
int distance = *current_2 - *current_1;
while (true) {
current_1 = upper_bound(current_1 + 1, end_1, *current_2);
distance = min(distance, *current_2 - current_1[-1]);
if (current_1 == end_1) return distance;
distance = min(distance, *current_1 - *current_2);
swap(current_1, current_2), swap(end_1, end_2);
}
}
};
#define endl '\n'
static const auto speedup =
[]() { ios::sync_with_stdio(0); cin.tie(0); cout.tie(0); return 0; }();
April 4, 2019 11:53 AM
The code is correct, but the comment is wrong. "# Until the shorter of the two lists is processed", not necessarily. The longer list could be consumed first.
Time Submitted | Status | Runtime | Memory | Language |
|---|---|---|---|---|
| 06/19/2021 08:58 | Accepted | 32 ms | 19.6 MB | cpp |
| 06/19/2021 08:53 | Time Limit Exceeded | N/A | N/A | cpp |
xxxxxxxxxxclass WordDistance {public: unordered_map<string, vector<int>> mp; WordDistance(vector<string>& wordsDict) { for(int i=0;i<wordsDict.size();i++) { mp[wordsDict[i]].push_back(i); } } int shortest(string word1, string word2) { vector<int> l1 = mp[word1]; vector<int> l2 = mp[word2]; int i = 0, j = 0, res = INT_MAX; while(i < l1.size() && j < l2.size()) { res = min(res, abs(l1[i] - l2[j])); if(l1[i] < l2[j]) i++; else j++; } return res; }};/** * Your WordDistance object will be instantiated and called as such: * WordDistance* obj = new WordDistance(wordsDict); * int param_1 = obj->shortest(word1,word2); */